Abstract

Writers usually go through several iterations of
revision and editing while writing. To better
understand the process of rewriting,
we need to know what has changed be- 
tween the revisions. Prior work mainly fo- 
cuses on detecting corrections within sen- 
tences, which is at the level of words 
or phrases. This paper proposes to de- 
tect revision changes at the sentence level. 
Looking at revisions at a higher level al- 
lows us to have a different understanding 
of the revision process. This paper also 
proposes an approach to automatically de- 
tect sentence revision changes. The pro- 
posed approach shows high accuracy in an 
evaluation using first and final draft essays 
from an undergraduate writing class. 

1 Introduction 

Rewriting is considered to be an important process 
during writing. However, conducting successful 
rewriting is not an easy task, especially for novice 
writers. Instructors work hard on providing sug- 
gestions for rewriting (Wells et al., 2013), but usu- 
ally such advice is quite general. We need to un- 
derstand the changes between revisions better to 
provide more specific and helpful advice. 

There has already been work on detecting cor- 
rections in sentence revisions (Xue and Hwa, 
2014; Swanson and Yamangil, 2012; Heilman 
and Smith, 2010; Rozovskaya and Roth, 2010). 
However, this work mainly focuses on detecting
changes at the level of words or phrases. Under
Faigley’s definition of revision changes
(Faigley and Witte, 1981), it could help identify
Surface Changes (changes that do not add
information to or remove information from the
original text). However, Text Changes (changes
that add or remove information) are more difficult
to identify if we only look at revisions within
sentences. According to Hashemi and Schunn (2014),
when instructors were presented with a word-level
comparison of the differences between papers, they
felt that the information about changes between
revisions was overwhelming.

This paper proposes to look at the changes between
revisions at the level of sentences. Compared to
detecting changes at the word level, detecting
changes at the sentence level yields less
information, but still keeps enough to understand
the authors’ intention behind their modifications
to the text. The sentence-level edits can then be
grouped and classified into different types of
changes. The long-term goal of this project is to
identify both Text Changes and Surface Changes
automatically. Students, teachers, and researchers
could then analyze the different types of changes
and gain a better understanding of the rewriting
process. As preliminary work, this paper explores
two steps toward this goal. First, automatically
generate a description of the changes based on
four primitives: Add, Delete, Modify, and Keep.
Second, merge the primitives that serve the same
purpose.

2 Related work 

Hashemi and Schunn (2014) presented a tool 
to help professors summarize students’ changes
across papers before and after peer review. They 
first split the original documents into sentences 
and then built on the output of Compare Suite 
(CompareSuite, 2014) to count and highlight 
changes in different colors. Figure 1 shows a 
screenshot of their work. As we can see, the
modifications to the text are misinterpreted. Line 66
in the final draft should correspond to line 55 and 
line 56 in the first draft, while line 67 and line 68 
should be a split of line 57 in the first draft.
However, line 67 is wrongly aligned to line 56 in
their work. This wrong alignment caused many
misrecognized modifications. According to Hashemi,
the instructors who use the system think that the
overwhelming amount of information about changes
makes the system less useful. We hypothesize that,
because their work is based on analysis at the word
level, their approach might work for identifying
differences within one sentence but makes mistakes
when sentence-level analysis is the primary concern.

Our work avoids the above problem by detect- 
ing differences at the sentence level. Sentence 
alignment is the first step of our method; further
inferences about revision changes are then
based on the alignments generated. We borrow
ideas from the research on sentence alignment for 
monolingual corpora. Existing research usually 
focuses on the alignment from the text to its sum- 
marization or its simplification (Jing, 2002; Barzi- 
lay and Elhadad, 2003; Bott and Saggion, 2011). 
Barzilay and Elhadad (2003) treat sentence align- 
ment as a classification task. The paragraphs are 
clustered into groups, and a binary classifier is 
trained to decide whether two sentences should be 
aligned or not. Nelken (2006) further improves 
the performance by using TF*IDF score instead of 
word overlap and by using global optimization to
take sentence order information into consideration.
We argue that summarization can be considered a
special form of revision, and we adapt Nelken’s
approach to our problem.

Edit sequences are then inferred based on the
results of sentence alignment. Fragments of edits
that serve the same purpose are then merged.
Sentence clustering (Shen et al., 2011; Wang et
al., 2009) is related to our method. While sentence
clustering tries to find and cluster sentences
similar to each other, our work finds a cluster of
sentences in one document that, after merging, is
similar to one sentence in the other document.

3 Sentence-level changes across revisions 

3.1 Primitives for sentence-level changes 

Previous work in educational revision analysis 
(Faigley and Witte, 1981; Connor and Asenavage,
1994) categorized revision changes as either
surface changes or text-based changes. Within
both categories, six kinds of changes were defined,
as shown in Table 1.

Code            Explanation
Addition        Adding a word or phrase
Deletion        Omitting a word or phrase
Substitutions   Exchanging words with synonyms
Permutation     Rearranging words or phrases
Distribution    Dividing one segment into two
Consolidation   Combining two segments into one

Table 1: Code definitions by L. Faigley and S. Witte

Unlike Faigley’s definition, we define only four
primitives for our first step of edit sequence
generation: Add, Delete, Modify and Keep. This
definition is similar to Bronner’s work (Bronner
and Monz, 2012). We choose this definition because
each of these four primitives corresponds to only
one sentence at a time. Add, Delete and Modify
indicate that the writer has added, deleted or
modified a sentence. Keep means the original
sentence is not modified. We believe Permutation,
Distribution and Consolidation as defined by
Faigley can be described with these four
primitives and recognized in the later merge step.

3.2 Data and annotation 

The corpus we chose consists of paired first and
final drafts of short papers written by
undergraduates in a course, “Social Implications of
Computing Technology”. Students were required to
write papers on one topic and then revise their own
papers. The revisions were guided by other
students’ feedback based on a grading rubric, using
a web-based peer review system. Students first
submitted their original papers to the system and
were then randomly assigned to review and comment
on others’ work according to the writing rubric.
The authors received the others’ anonymous
comments and could then choose to revise their
work based on those comments as well as on their
own insights gained from reviewing other papers.

The papers in the corpus cover two topics.
In the first topic, the students discussed the role
that Big Data played in Obama's presidential
campaign. This topic contains 11 pairs of first and
final drafts of short papers. We name this C1. The
other topic, named C2, is about intellectual
property and contains 10 pairs of paper drafts. The
students involved in these two topics are from the
same class. Students made more modifications to
their papers in C2. More details can be seen in
Table 2.

Our revision change detection approach con- 
tains three steps: sentence alignment, edit se- 
quence generation and merge of edit sequences. 
Thus we annotated for these three steps. 
(a) first draft

54  This large amount of advertising money leads companies to no longer needing to sell their product
    to people but just bring people to their site by offering them free use of their product .
55  This has thus proven Dyson 's prediction that companies would give away copyright material in
    order to attract people to their site .
56  it has also been show companies will give away their products for free in order to sell ancillary
    products .
57  Lets take the app Angry Birds for example ; it gave away its game for free to millions of people
    these people who liked this game then spend millions of dollars on t-shirt , stuffed animal , and
    additional game content ; the creators of angry birds made 106.3 million dollars last year off
    something they gave away for free .

(b) final draft

65  This large amount of advertising money leads companies to no longer needing to sell their product to
    people but just bring people to their site by offering them free use of their product .
66  Dyson prediction of companies giving away copyright material in order to sell ancillary products has also
    come true .
67  Lets take the app Angry Birds for example ; it gave away its game for free ( or for a dollar, but still well
    bellow its market value ) to millions of people , these people who liked this game then spend millions
    of dollars on t-shirt , stuffed animal , and additional game content .
68  The creators of angry birds made 106.3 million dollars last year off something they gave away essentially
    for free.

(c) Revision detection using Hashemi’s approach
    [screenshot of the Compare Suite word-level comparison; not reproduced]

Figure 1: Fragments of a paper in corpus C2 discussing intellectual property. (c) is Hashemi’s work:
green for recognized modifications, blue for insertions and red for deletions.

For sentence alignment, each sentence in the final
draft is assigned the index of its aligned sentence
in the original draft. If a sentence is newly
added, it is annotated as ADD. Sentence alignment
is not necessarily one-to-one. It can also be
one-to-many (Consolidation) or many-to-one
(Distribution). Table 3 shows a fragment of the
annotation for the text shown in Figure 1.

For edit sequences, the annotators annotate based
on the initial draft. For the same fragment in
Table 3, the annotated sequence is: Keep, Modify,
Delete, Modify, Add¹.

For edit sequence merging, we further annotate 
Consolidation and Distribution based on the edit 
sequences. In our example, 66 consolidates 55 and 
56, while 57 distributes to 67 and 68. 



Corpus   pairs   #D1   #D2   Avg1   Avg2
C1       11      761   791   22.5   22.7
C2       10      645   733   24.7   24.5

Table 2: Detailed information of corpora. #D1 and
#D2 are the numbers of sentences in the first and
final drafts; Avg1 and Avg2 are the average numbers
of words per sentence in the first and final drafts.

¹ 66 consolidates 55 and 56, while 57 distributes to 67 and 68.
Notice that Consolidation is illustrated as Modify, Delete and
Distribution is illustrated as Modify, Add. Because the annotators
annotate based on the first draft, Modify always appears before
Add or Delete.

As this is preliminary work, a single annotator did
all the annotations. For the annotation of sentence
alignments, however, two annotators annotated one
pair of papers. The paper contains 76 sentences,
and the annotators disagree on only one sentence.
The kappa is 0.794², which suggests that the
annotation is reliable under our annotation scheme.
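As an illustration of how such an agreement figure can be obtained, the sketch below computes Cohen's kappa over two label sequences. It assumes that the procedure reduces to plain Cohen's kappa over the two categories direct-link and null-link; the labels used here are hypothetical and are not the annotation data of this paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels (an assumption): 76 sentences, one disagreement.
annotator_a = ["direct-link"] * 73 + ["null-link"] * 3
annotator_b = ["direct-link"] * 74 + ["null-link"] * 2
kappa = cohen_kappa(annotator_a, annotator_b)
```

Because null-link sentences are rare, a single disagreement lowers kappa far more than it lowers raw agreement, which is why kappa is the more informative figure here.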

4 Automatic detection of revision changes

The detection of revision changes contains three 
parts: sentence alignment, edit sequence genera- 
tion and edit sequence merging. The first two parts 
generate edit sequences detected at the sentence 
level, while the third part groups edit sequences 
and classifies them into different types of changes. 
Currently the third step only covers the identifica- 
tion of Consolidation and Distribution. 


Sentence Index (Final)   65   66      67   68
Sentence Index (First)   54   55,56   57   57

Table 3: An example of alignment annotation


² We calculate the Kappa value following Macken’s idea
(Macken, 2010), where aligned sentences are categorized
as direct-link and newly added sentences are categorized as
null-link (ADD).

Sentence alignment We adapted Nelken’s approach to
our problem.

Alignment based on sentence similarity

The alignment task goes through three stages.

1. Data preparation: for each sentence in the
annotated final draft, if it is not a new sentence,
create a sentence pair with its aligned sentence in
the first draft. This pair is considered an aligned
(positive) pair. Also, randomly select another
sentence from the first draft to make a negative
sentence pair. This ensures that there are nearly
equal numbers of positive and negative cases in the
training data.

2. Training: calculate the similarity of each
sentence pair according to the chosen similarity
metric. A logistic regression classifier that
predicts whether a sentence pair is aligned is
trained with the similarity score as the feature.
In addition to classification, the classifier is
also used to provide a similarity score for global
alignment.

3. Alignment: for each pair of paper drafts,
construct sentence pairs using the Cartesian
product of the sentences in the first draft and the
sentences in the final draft. The logistic
regression classifier is used to determine whether
each sentence pair is aligned.
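The data preparation stage can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name, the list-of-indices representation of the gold alignment, and the fixed random seed are all assumptions made for the example.

```python
import random


def build_training_pairs(first_draft, final_draft, gold_alignment, seed=0):
    """Return (final_sentence, first_sentence, label) triples.

    gold_alignment[j] lists the first-draft indices aligned to final
    sentence j; an empty list marks a newly added sentence (ADD).
    """
    rng = random.Random(seed)
    pairs = []
    for j, aligned in enumerate(gold_alignment):
        if not aligned:  # new sentence: no positive pair is created
            continue
        for i in aligned:  # positive (aligned) pairs
            pairs.append((final_draft[j], first_draft[i], 1))
        # One randomly chosen non-aligned first-draft sentence
        # serves as the negative case.
        candidates = [i for i in range(len(first_draft)) if i not in aligned]
        if candidates:
            negative = first_draft[rng.choice(candidates)]
            pairs.append((final_draft[j], negative, 0))
    return pairs


# Toy drafts (hypothetical): the second final sentence consolidates two.
first = ["a", "b", "c", "d"]
final = ["a2", "bc", "new"]
gold = [[0], [1, 2], []]
training_pairs = build_training_pairs(first, final, gold, seed=1)
```

Each pair would then be reduced to a single similarity score and passed to a logistic regression classifier, as described in the training stage.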

We added Levenshtein distance (LD) (Levenshtein,
1966) as another similarity metric in addition to
Nelken’s metrics. In total, three similarity
metrics were compared: Levenshtein Distance (LD),
Word Overlap (WO), and TF*IDF.
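To make the three metrics concrete, the sketch below gives one possible formulation of each. The exact definitions are not spelled out here, so the normalization of the Levenshtein distance, the Jaccard form of word overlap, and the smoothed IDF weighting are assumptions chosen for illustration.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def ld_similarity(s1, s2):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s1, s2) / longest


def word_overlap(s1, s2):
    """Assumed form: shared word types over all word types (Jaccard)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if (w1 | w2) else 1.0


def tfidf_cosine(s1, s2, corpus):
    """Cosine similarity of TF*IDF vectors; each sentence in corpus
    is treated as one document when counting document frequencies."""
    docs = [set(s.lower().split()) for s in corpus]

    def idf(word):
        df = sum(word in d for d in docs)
        return math.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF

    def vector(s):
        return {w: tf * idf(w) for w, tf in Counter(s.lower().split()).items()}

    v1, v2 = vector(s1), vector(s2)
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

All three functions return values between 0 and 1, so any of them can serve as the single feature of the classifier.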

Global alignment 

Sentences are likely to preserve the same order
between rewritings. Thus, sentence ordering should
be an important feature in sentence alignment.
Nelken’s work modifies the Needleman-Wunsch
alignment (Needleman and Wunsch, 1970) to find the
sentence alignments and proceeds in the following
steps.

Step 1: The previously trained logistic regression
classifier assigns a probability value from 0 to 1
to each sentence pair (i, j). Use this value as the
similarity score of the sentence pair: sim(i, j).

Step 2: Starting from the first pair of sentences,
find the best path to maximize the likelihood
between sentences according to the formula

$$ s(i,j) = \max\{\, s(i-1,j-1) + \mathrm{sim}(i,j),\; s(i-1,j) + \mathrm{sim}(i,j),\; s(i,j-1) + \mathrm{sim}(i,j) \,\} $$

Step 3: Infer the sentence alignments by
backtracing through the matrix s(i, j).

We found that changing the second and third terms
of the formula, so that it becomes

$$ s(i,j) = \max\{\, s(i-1,j-1) + \mathrm{sim}(i,j),\; s(i-1,j) + \mathit{insertcost},\; s(i,j-1) + \mathit{deletecost} \,\} $$

gives better performance on our problem. Based on
our experiments with C1, insertcost and deletecost
are both set to 0.1, the values found to be most
effective in practice.
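The modified recurrence can be sketched as a small dynamic program. The sketch makes several assumptions that are not stated above: the boundary cells are filled with accumulated insert and delete costs, ties are broken in favor of the first option listed, and only one-to-one links are recovered from the backtrace.

```python
def global_align(sim, insert_cost=0.1, delete_cost=0.1):
    """Align two drafts from a similarity matrix.

    sim[i][j] is the classifier probability that first-draft sentence i
    aligns with final-draft sentence j. Returns a list of (i, j) pairs.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary conditions: leading gaps accumulate the gap cost.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + insert_cost
        move[i][0] = "up"
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + delete_cost
        move[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (s[i - 1][j - 1] + sim[i - 1][j - 1], "diag"),
                (s[i - 1][j] + insert_cost, "up"),
                (s[i][j - 1] + delete_cost, "left"),
            ]
            s[i][j], move[i][j] = max(options, key=lambda o: o[0])
    # Backtrace from the last cell, collecting the diagonal moves.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Toy similarity matrix (hypothetical): 2 first-draft, 3 final-draft sentences.
toy_sim = [[0.9, 0.1, 0.1],
           [0.1, 0.1, 0.8]]
toy_alignment = global_align(toy_sim)
```

With fixed gap costs, skipping a sentence in each draft scores 0.2 in total, so a pair is only linked on the diagonal when its similarity makes that the better path; this is one way to read why the fixed costs help.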

Edit sequence generation This is an intermediate
step, which generates the edit sequence from the
sentence alignment results of the previous step.
The generated edit sequences are later grouped
together and classified into different types. In
our current work, a rule-based method is proposed
for this step.

Step 1: The index i of the original document and
the index j of the modified document both start
from 0. If sentence i in the original document is
aligned to sentence j in the modified one, go to
Step 2; if not, go to Step 3.

Step 2: If the two sentences are exactly the same,
add Keep to the edit sequence; if not, add Modify.
Increase i and j by 1 and go to Step 1.

Step 3: Check the predicted alignment index of
sentence j. If the predicted index is larger than
the index i of the current sentence in the original
document, add Delete and increase i by 1;
otherwise, add Add and increase j by 1. Go to
Step 1.
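The three steps can be sketched as a single loop. The representation of the alignment as a set of first-draft indices per final-draft sentence, the use of the smallest index in that set for the comparison in Step 3, and the handling of leftover sentences at the end are assumptions made for this illustration.

```python
def generate_edit_sequence(first, final, alignment):
    """Rule-based edit sequence generation.

    alignment[j] is the set of first-draft indices aligned to final
    sentence j (an empty set for a newly added sentence).
    """
    edits = []
    i = j = 0
    while i < len(first) and j < len(final):
        if i in alignment[j]:                         # Step 1 -> Step 2
            edits.append("Keep" if first[i] == final[j] else "Modify")
            i += 1
            j += 1
        elif alignment[j] and min(alignment[j]) > i:  # Step 3: i was dropped
            edits.append("Delete")
            i += 1
        else:                                         # Step 3: j is new
            edits.append("Add")
            j += 1
    # Assumption: remaining first-draft sentences are deletions and
    # remaining final-draft sentences are additions.
    edits.extend(["Delete"] * (len(first) - i))
    edits.extend(["Add"] * (len(final) - j))
    return edits


# The fragment of Table 3, with indices shifted to start at 0
# (54..57 -> 0..3 and 65..68 -> 0..3); sentence texts are placeholders.
first_draft = ["s54", "s55", "s56", "s57"]
final_draft = ["s54", "s66", "s67", "s68"]
table3_alignment = [{0}, {1, 2}, {3}, {3}]
sequence = generate_edit_sequence(first_draft, final_draft, table3_alignment)
```

On this fragment the loop yields Keep, Modify, Delete, Modify, Add, which matches the annotated sequence given for the same fragment in Section 3.2.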

Edit sequence merging Distribution means
splitting one sentence into two or more sentences,
while Consolidation means merging two or more
sentences into one sentence. These two operations
can be derived from the primitives Modify, Add and
Delete. They follow these patterns:

Consolidation: Modify-Delete-Delete-...

Distribution: Modify-Add-Add-...

Both sequences start with Modify, followed by one
or more Delete or Add operations. A group of edits
can be merged if the merged sentences are close to
the corresponding sentence in the other draft. We
applied a rule-based approach based on our
observations.

We first scan through the sequence generated
above. Sequences of the form Modify-Add-... or
Modify-Delete-... are extracted. For each extracted
sequence, if there are n consecutive Add or Delete
operations following Modify, create n groups;
Group_i (i ≤ n) contains the sentences from the
modified sentence through the next i consecutive
sentences. For each group, merge all the sentences
and use the classifier trained above to get the
similarity score Sim_{group_i} between the merged
sentence and the original one. If multiple groups
are classified as aligned, choose the group i with
the largest Sim_{group_i} and merge the basic edit
operations into Consolidation or Distribution. If
none of the groups is classified as aligned, do not
merge.
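The merging rule can be sketched as follows. The triple representation of edits, the 0.5 decision threshold, and the `jaccard` function (a toy stand-in for the trained classifier's score) are all hypothetical choices made so that the example is self-contained.

```python
def jaccard(a, b):
    """Toy stand-in for the classifier's alignment probability."""
    w1, w2 = set(a.split()), set(b.split())
    return len(w1 & w2) / len(w1 | w2) if (w1 | w2) else 1.0


def merge_edits(edits, score, threshold=0.5):
    """Merge Modify-Delete-... and Modify-Add-... runs.

    edits: list of (op, first_sentence, final_sentence) triples, with
    None for a missing side. Returns the list of merged operations.
    """
    merged, k = [], 0
    while k < len(edits):
        op, first, final = edits[k]
        follower = edits[k + 1][0] if k + 1 < len(edits) else None
        if op != "Modify" or follower not in ("Add", "Delete"):
            merged.append(op)
            k += 1
            continue
        # Count the n consecutive Add (or Delete) edits after Modify.
        n = 0
        while k + 1 + n < len(edits) and edits[k + 1 + n][0] == follower:
            n += 1
        best_size, best_score = 0, threshold
        for size in range(1, n + 1):          # Group_1 .. Group_n
            group = edits[k:k + 1 + size]
            if follower == "Delete":          # candidate Consolidation
                candidate = score(" ".join(e[1] for e in group), final)
            else:                             # candidate Distribution
                candidate = score(first, " ".join(e[2] for e in group))
            if candidate >= best_score:       # aligned and best so far
                best_size, best_score = size, candidate
        if best_size == 0:                    # no group aligned: no merge
            merged.append(op)
            k += 1
        else:
            label = "Consolidation" if follower == "Delete" else "Distribution"
            merged.append(label)
            k += 1 + best_size
    return merged


# Hypothetical edit sequence with one merge of each kind.
example_edits = [
    ("Keep", "a b", "a b"),
    ("Modify", "c d", "c d e f"),
    ("Delete", "e f", None),
    ("Modify", "g h i j", "g h"),
    ("Add", None, "i j"),
    ("Add", None, "x y z"),
]
merged_ops = merge_edits(example_edits, jaccard)
```

In the example the second Add is left unmerged because including it lowers the similarity between the merged sentences and the original one.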

5 Evaluation 

Sentence alignment We use accuracy as the
evaluation metric. For each pair of drafts, we
count the number of sentences in the final draft as
N_1. Among these, we count the number of sentences
that get the correct alignment as N_2. The accuracy
of the sentence alignment is N_2 / N_1.³
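A minimal sketch of this accuracy computation is given below. It assumes that an alignment is stored as a set of first-draft indices per final-draft sentence and that a prediction is correct only when it equals the gold set exactly, which is one strict reading of the requirement that an alignment cover all the sentences it should.

```python
def alignment_accuracy(predicted, gold):
    """Fraction of final-draft sentences whose alignment is correct.

    predicted[j] and gold[j] are sets of first-draft indices for final
    sentence j (an empty set marks a newly added sentence).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same sentences")
    n1 = len(gold)                                   # sentences in final draft
    n2 = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return n2 / n1 if n1 else 0.0


# Fragment of Table 3, with sentence 66 aligned to 55 only (an error).
gold_fragment = [{54}, {55, 56}, {57}, {57}]
predicted_fragment = [{54}, {55}, {57}, {57}]
fragment_accuracy = alignment_accuracy(predicted_fragment, gold_fragment)
```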

We use Hashemi’s approach as the baseline.
Compare Suite highlights the differences in color,
as shown in Figure 1. We treat the green sentences
as Modify and as aligned to the original sentence.

For our method, we tried four groups of settings.
Groups 1 and 2 perform leave-one-out cross
validation on C1 and C2, respectively (test on one
pair of paper drafts and train on the others).
Groups 3 and 4 train on one corpus and test on the
other.


Group   LD       WO       TF*IDF   Baseline
1       0.9811   0.9863   0.9931   0.9427
2       0.9649   0.9593   0.9667   0.9011
3       0.9727   0.9700   0.9727   0.9045
4       0.9860   0.9886   0.9798   0.9589

Table 4: Accuracy of our approach vs. baseline

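Table 4 shows that all our methods beat the
baseline⁴. Among the three similarity metrics,
TF*IDF is the most predictive overall: it scores
highest in Groups 1 and 2 and ties with LD in
Group 3, although WO scores highest in Group 4.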

Edit sequence generation We use WER (Word
Error Rate) from speech recognition to evaluate
the generated sequence by comparing it to the gold
standard.

WER is calculated based on edit distances between
sequences. The ratio is calculated as

$$ \mathrm{WER} = \frac{S + D + I}{N} $$

where S is the number of modifications, D is the
number of deletes, I is the number of inserts, and
N is the length of the gold standard sequence (the
usual WER normalization).
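As a sketch of this metric, the function below computes the edit distance between a generated edit sequence and the gold standard and divides it by the length of the gold standard. Normalizing by the gold-standard length follows the usual WER definition and is an assumption here; the example sequences are hypothetical.

```python
def wer(generated, gold):
    """Edit distance between two label sequences over len(gold)."""
    if not gold:
        raise ValueError("gold sequence must be non-empty")
    # Dynamic programming over substitutions, deletions and insertions.
    prev = list(range(len(generated) + 1))
    for i, g in enumerate(gold, start=1):
        curr = [i]
        for j, h in enumerate(generated, start=1):
            curr.append(min(prev[j] + 1,              # gold label missing
                            curr[j - 1] + 1,          # extra generated label
                            prev[j - 1] + (g != h)))  # substituted label
        prev = curr
    return prev[-1] / len(gold)


gold_sequence = ["Keep", "Modify", "Delete", "Modify", "Add"]
generated_sequence = ["Keep", "Modify", "Modify", "Add"]  # Delete was missed
example_wer = wer(generated_sequence, gold_sequence)
```

Here one missing Delete in a gold sequence of five edits gives a WER of 1/5.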

We apply our method to the gold standard sentence
alignment. The generated edit sequence is then
compared with the gold standard edit sequence to
calculate WER. Hashemi's approach is chosen as the
baseline. The WER of our method is 0.035 on C1 and
0.017 on C2, compared to 0.091 on C1 and 0.153 on
C2 for the baseline, which shows that our
rule-based method has promise. Applying our method
to the alignment predicted in the first step gives
0.067 on C1 and 0.025 on C2, which, although
degraded, still beats the baseline.

³ Notice that one sentence may be aligned to two
sentences (i.e. Consolidation, as with sentence 66 in Table
3). In our evaluation, an alignment is considered correct
only if it covers all the sentences that should be
covered. For example, if Sentence 66 in Table 3 is aligned only to
Sentence 55 in the first draft, it is counted as an error.

⁴ For Groups 1 and 2, we calculate the accuracy of
Hashemi’s approach under a leave-one-out setting, each time
removing one pair of documents and calculating the accuracy. A
significance test was also conducted: the worst metrics, LD in
Group 1 and WO in Group 2, both beat the baseline significantly
(p1 = 0.025, p2 = 0.017) in a two-tailed t-test.

Edit sequence merging There are only a limited
number of Consolidation and Distribution examples
in our corpus: 9 Consolidation and 5 Distribution
operations in total. In our current data, the
number of sentences involved in these operations is
always 2. Our rule-based method achieved 100%
accuracy in identifying these operations. Further
work is needed to see whether the method performs
equally well on more complicated corpora.

6 Conclusion 

This paper presents preliminary work toward
describing changes across revisions at a higher
level than words, motivated by the long-term goal
of building educational applications that support
revision analysis for writing. Compared to revision
analysis based on words or phrases, our approach is
able to capture higher-level revision operations.
We also propose algorithms to detect revision
changes automatically. Experiments show that our
method performs reliably.

Currently we are investigating applying sequence
merging to automatically generated edit sequences
based directly on edit distances. Our next plan is
to develop a tool for comparing drafts and to
conduct user studies as an extrinsic evaluation of
whether our method provides more useful information
to the user. We also plan to do further analysis
based on the detected revisions and, ultimately, to
distinguish between surface changes and text-based
changes.